
Optimized scheduling of electricity-heat cooperative system considering wind energy consumption and peak shaving and valley filling

Ye, Jin, Wang, Lingmei, Zhang, Shujian, Wu, Haihang

arXiv.org Artificial Intelligence

With the global energy transition and the rapid development of renewable energy, the scheduling optimization challenge for combined power-heat systems under new energy integration and multiple uncertainties has become increasingly prominent. To address this challenge, this study proposes an intelligent scheduling method based on an improved Twin Delayed Deep Deterministic Policy Gradient algorithm (PVTD3), which optimizes the system by introducing a penalty term on variations in grid power purchases. Simulation results demonstrate that under three typical scenarios (10%, 20%, and 30% renewable penetration), the PVTD3 algorithm reduces the system's comprehensive cost by 6.93%, 12.68%, and 13.59% respectively compared to the traditional TD3 algorithm, while also reducing the average fluctuation amplitude of grid power purchases by 12.8%. Regarding energy storage management, the PVTD3 algorithm reduces the end-time state values of low-temperature thermal storage tanks by 7.67-17.67 units while maintaining high-temperature tanks within the 3.59-4.25 safe operating range. Multi-scenario comparative validation demonstrates that the proposed algorithm not only excels in economic efficiency and grid stability but also exhibits superior sustainable scheduling capability in energy storage device management.
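A minimal sketch of the reward-shaping idea described in the abstract: a negative operating cost augmented with a penalty on step-to-step changes in grid power purchases. The function name, cost model, and the coefficient `lambda_var` are illustrative assumptions, not details from the paper.

```python
def shaped_reward(operating_cost, grid_purchase, prev_grid_purchase,
                  lambda_var=0.1):
    """Hypothetical TD3-style reward: negative comprehensive cost plus a
    penalty discouraging abrupt swings in grid power purchases."""
    variation_penalty = lambda_var * abs(grid_purchase - prev_grid_purchase)
    return -(operating_cost + variation_penalty)
```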


Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs

Sato, Takuma, Kawano, Seiya, Yoshino, Koichiro

arXiv.org Artificial Intelligence

The ability to accurately interpret implied meanings plays a crucial role in human communication and language use, and language models are also expected to possess this capability. This study demonstrates that providing language models with pragmatic theories as prompts is an effective in-context learning approach for tasks that require understanding implied meanings. Specifically, we propose an approach in which an overview of pragmatic theories, such as Gricean pragmatics and Relevance Theory, is presented as a prompt to the language model, guiding it through a step-by-step reasoning process to derive a final interpretation. Experimental results show that, compared to a baseline that prompts intermediate reasoning without presenting pragmatic theories (0-shot Chain-of-Thought), our method enables language models to achieve up to 9.6% higher scores on pragmatic reasoning tasks. Furthermore, we show that even without explaining the details of pragmatic theories, merely mentioning their names in the prompt yields a modest performance improvement (around 1-3%) in larger models compared to the baseline.
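A minimal sketch of the prompting scheme described above: an overview of a pragmatic theory is prepended to the input, and the model is asked to reason step by step before committing to an interpretation. The template wording here is an assumption, not the authors' exact prompt.

```python
GRICE_OVERVIEW = (
    "Gricean pragmatics: speakers are assumed to be cooperative, following "
    "maxims of quantity, quality, relation, and manner. Apparent violations "
    "of a maxim often signal an implicature."
)

def build_prompt(context, utterance):
    """Assemble a theory-guided chain-of-thought prompt (illustrative)."""
    return (
        f"{GRICE_OVERVIEW}\n\n"
        f"Context: {context}\n"
        f"Utterance: {utterance}\n"
        "Using the theory above, reason step by step about which maxim is "
        "flouted and what the speaker implies, then state the final "
        "interpretation."
    )
```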



Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement

Yang, Zhe, Li, Wenrui, Chen, Hongtao, Wang, Penghong, Xiong, Ruiqin, Fan, Xiaopeng

arXiv.org Artificial Intelligence

Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multimodal training, modality bias causes the advantaged modality to dominate backpropagation, leading to imbalanced optimization. Existing methods still face two problems. First, the long-term dominance of the advantaged modality weakens representation-output coupling in the late stages of training, resulting in the accumulation of redundant information. Second, previous methods often directly and uniformly adjust the gradients of the advantaged modality, ignoring the semantics and directionality between modalities. To address these limitations, we propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement (RedReg), inspired by the information bottleneck principle. Specifically, we construct a redundancy phase monitor that uses a joint criterion of effective gain growth rate and redundancy to trigger intervention only when redundancy is high. Furthermore, we design a co-information gating mechanism to estimate the contribution of the current dominant modality based on cross-modal semantics; when the task primarily relies on a single modality, the suppression term is automatically disabled to preserve modality-specific information. Finally, we project the gradient of the dominant modality onto the orthogonal complement of the joint multimodal gradient subspace and suppress the gradient according to redundancy. Experiments show that our method outperforms current major methods in most scenarios, and ablation experiments verify its effectiveness. The code is available at https://github.com/xia-zhe/RedReg.git
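The gradient operation in the last step can be pictured with a small sketch. Treating the joint multimodal gradient as a single direction (a simplification of the paper's gradient subspace), the dominant modality's gradient is split into a component along that direction and an orthogonal remainder, and the parallel component is shrunk in proportion to the measured redundancy. The names and the scalar suppression rule here are assumptions for illustration.

```python
import numpy as np

def redundancy_suppressed_grad(g_dom, g_joint, redundancy, beta=1.0):
    """Split g_dom into parts parallel/orthogonal to g_joint and shrink
    the parallel part according to the redundancy estimate (assumed rule)."""
    denom = np.dot(g_joint, g_joint) + 1e-12
    parallel = (np.dot(g_dom, g_joint) / denom) * g_joint  # along joint gradient
    orthogonal = g_dom - parallel                          # orthogonal complement
    scale = max(0.0, 1.0 - beta * redundancy)              # more redundancy -> more suppression
    return orthogonal + scale * parallel
```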


Decomposing and Revising What Language Models Generate

Yan, Zhichao, Chen, Jiaoyan, Wang, Jiapu, Li, Xiaoli, Li, Ru, Pan, Jeff Z.

arXiv.org Artificial Intelligence

Attribution is crucial in question answering (QA) with Large Language Models (LLMs). State-of-the-art question decomposition-based approaches use long-form answers to generate questions for retrieving related documents. However, the generated questions are often irrelevant and incomplete, resulting in a loss of facts during retrieval. These approaches also fail to aggregate evidence snippets from different documents and paragraphs. To tackle these problems, we propose a new fact decomposition-based framework called FIDES (faithful context enhanced fact decomposition and evidence aggregation) for attributed QA. FIDES uses a contextually enhanced two-stage faithful decomposition method to decompose long-form answers into sub-facts, which are then used by a retriever to retrieve related evidence snippets. If the retrieved evidence snippets conflict with the related sub-facts, such sub-facts are revised accordingly. Finally, the evidence snippets are aggregated according to the original sentences. Extensive evaluation has been conducted on six datasets, with an additionally proposed metric, Attr_auto-P, for evaluating evidence precision. FIDES outperforms the SOTA methods by over 14% on average with GPT-3.5-turbo, Gemini, and the Llama 70B series.
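The decompose-retrieve-revise loop can be summarized schematically. The helpers below (`decompose`, `retrieve`, `conflicts`, `revise`) stand in for LLM and retriever calls; all names are hypothetical, not the FIDES API.

```python
def attribute_answer(answer, decompose, retrieve, conflicts, revise):
    """Schematic FIDES-style pipeline: split a long-form answer into
    sub-facts, fetch evidence, and revise sub-facts that conflict with it."""
    attributed = []
    for sub_fact in decompose(answer):       # two-stage faithful decomposition
        evidence = retrieve(sub_fact)        # evidence snippets for this sub-fact
        if conflicts(sub_fact, evidence):    # evidence contradicts the sub-fact
            sub_fact = revise(sub_fact, evidence)
        attributed.append((sub_fact, evidence))
    return attributed  # snippets are later aggregated per original sentence
```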


Semantic4Safety: Causal Insights from Zero-shot Street View Imagery Segmentation for Urban Road Safety

Chen, Huan, Han, Ting, Chen, Siyu, Guo, Zhihao, Chen, Yiping, Wu, Meiliu

arXiv.org Artificial Intelligence

Street-view imagery (SVI) offers a fine-grained lens on traffic risk, yet two fundamental challenges persist: (1) how to construct street-level indicators that capture accident-related features, and (2) how to quantify their causal impacts across different accident types. To address these challenges, we propose Semantic4Safety, a framework that applies zero-shot semantic segmentation to SVIs to derive 11 interpretable streetscape indicators, and integrates road type as contextual information to analyze approximately 30,000 accident records in Austin. Specifically, we train an eXtreme Gradient Boosting (XGBoost) multi-class classifier and use Shapley Additive Explanations (SHAP) to interpret both global and local feature contributions, and then apply Generalized Propensity Score (GPS) weighting and Average Treatment Effect (ATE) estimation to control confounding and quantify causal effects. Results uncover heterogeneous, accident-type-specific causal patterns: features capturing scene complexity, exposure, and roadway geometry dominate predictive power; larger drivable area and emergency space reduce risk, whereas excessive visual openness can increase it. By bridging predictive modeling with causal inference, Semantic4Safety supports targeted interventions and high-risk corridor diagnosis, offering a scalable, data-informed tool for urban road safety planning.
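The predictive stage pairs a gradient-boosted classifier with SHAP attributions. A minimal sketch under assumed data shapes (synthetic stand-ins for the 11 indicators plus road type, and four accident classes); hyperparameters are illustrative, not the paper's.

```python
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))    # 11 streetscape indicators + road type (assumed encoding)
y = rng.integers(0, 4, size=1000)  # assumed four accident classes

model = xgb.XGBClassifier(objective="multi:softprob", n_estimators=200)
model.fit(X, y)

explainer = shap.TreeExplainer(model)    # tree-based SHAP explainer
shap_values = explainer.shap_values(X)   # per-class, per-sample feature contributions
```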


PITE: Multi-Prototype Alignment for Individual Treatment Effect Estimation

Cao, Fuyuan, Zhang, Jiaxuan, Li, Xiaoli

arXiv.org Artificial Intelligence

Estimating Individual Treatment Effects (ITE) from observational data is challenging due to confounding bias. Most studies tackle this bias by balancing distributions globally, but they ignore individual heterogeneity and fail to capture the local structure that reflects the natural clustering among individuals, which ultimately compromises ITE estimation. While instance-level alignment methods consider heterogeneity, they similarly overlook local structure information. To address these issues, we propose an end-to-end Multi-Prototype alignment method for ITE estimation (PITE). PITE effectively captures local structure within groups and enforces cross-group alignment, thereby achieving robust ITE estimation. Specifically, we first define prototypes as cluster centroids over similar individuals under the same treatment. To identify local similarity and preserve distribution consistency, we perform instance-to-prototype matching to assign each individual to the nearest prototype within its group, and design a multi-prototype alignment strategy that encourages matched prototypes to be close across treatment arms in the latent space. PITE not only reduces distribution shift through fine-grained, prototype-level alignment, but also preserves the local structures of the treated and control groups, which provides meaningful constraints for ITE estimation. Extensive evaluations on benchmark datasets demonstrate that PITE outperforms 13 state-of-the-art methods, achieving more accurate and robust ITE estimation.
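The matching step admits a compact sketch: each individual's latent representation is assigned to the nearest prototype within its treatment group, and matched prototypes are pulled together across groups. The matching scheme and all names below are illustrative assumptions.

```python
import numpy as np

def nearest_prototype(z, prototypes):
    """Assign each latent vector in z (n x d) to the nearest of the
    within-group prototypes (k x d); returns one index per individual."""
    dists = ((z[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

def prototype_alignment_loss(proto_treated, proto_control, matching):
    """Mean squared distance between treated prototypes and their matched
    control prototypes, encouraging cross-arm closeness in latent space."""
    diffs = proto_treated - proto_control[matching]
    return (diffs ** 2).sum(axis=1).mean()
```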


SeeingSounds: Learning Audio-to-Visual Alignment via Text

Carnemolla, Simone, Pennisi, Matteo, Russo, Chiara, Palazzo, Simone, Giordano, Daniela, Spampinato, Concetto

arXiv.org Artificial Intelligence

We introduce SeeingSounds, a lightweight and modular framework for audio-to-image generation that leverages the interplay between audio, language, and vision, without requiring any paired audio-visual data or training of visual generative models. Rather than treating audio as a substitute for text or relying solely on audio-to-text mappings, our method performs dual alignment: audio is projected into a semantic language space via a frozen language encoder and contextually grounded in the visual domain using a vision-language model. This approach, inspired by cognitive neuroscience, reflects the natural cross-modal associations observed in human perception. The model operates on frozen diffusion backbones and trains only lightweight adapters, enabling efficient and scalable learning. Moreover, it supports fine-grained and interpretable control through procedural text prompt generation, where audio transformations (e.g., volume or pitch shifts) translate into descriptive prompts (e.g., "a distant thunder") that guide visual outputs. Extensive experiments across standard benchmarks confirm that SeeingSounds outperforms existing methods in both zero-shot and supervised settings, establishing a new state of the art in controllable audio-to-visual generation.
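A toy sketch of the procedural prompt generation mentioned above: simple audio transformations are mapped to descriptive phrases that condition the frozen backbone. The mapping rules and thresholds are illustrative assumptions, not the paper's.

```python
def describe_audio(base_caption, volume_db=0.0, pitch_shift=0.0):
    """Map coarse audio transformations to a descriptive text prompt."""
    desc = base_caption
    if volume_db < -6:
        desc = f"a distant {desc}"      # quieter audio reads as farther away
    elif volume_db > 6:
        desc = f"a close, loud {desc}"
    if pitch_shift > 2:
        desc = f"{desc}, high-pitched"
    elif pitch_shift < -2:
        desc = f"{desc}, deep and rumbling"
    return desc

print(describe_audio("thunder", volume_db=-9))  # -> "a distant thunder"
```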


MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application

Peng, Xueqing, Qian, Lingfei, Wang, Yan, Xiang, Ruoyu, He, Yueru, Ren, Yang, Jiang, Mingyang, Zhang, Vincent Jim, Guo, Yuqing, Zhao, Jeff, He, Huan, Han, Yi, Feng, Yun, Jiang, Yuechen, Cao, Yupeng, Li, Haohang, Yu, Yangyang, Wang, Xiaoyu, Gao, Penglei, Lin, Shengyuan, Wang, Keyi, Yang, Shanshan, Zhao, Yilun, Liu, Zhiwei, Lu, Peng, Huang, Jerry, Wang, Suyuchen, Papadopoulos, Triantafillos, Giannouris, Polydoros, Soufleri, Efstathia, Chen, Nuo, Deng, Zhiyang, Fu, Heming, Zhao, Yijia, Lin, Mingquan, Qiu, Meikang, Smith, Kaleb E, Cohan, Arman, Liu, Xiao-Yang, Huang, Jimin, Xiong, Guojun, Lopez-Lira, Alejandro, Chen, Xi, Tsujii, Junichi, Nie, Jian-Yun, Ananiadou, Sophia, Xie, Qianqian

arXiv.org Artificial Intelligence

Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models such as GPT-4o achieve only 46.01% overall, performing more strongly on vision and audio tasks but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
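The difficulty-aware selection can be illustrated with a small sketch: datasets on which strong reference models already score near ceiling are treated as saturated and dropped. The threshold and dataset names are assumptions, not the benchmark's actual values.

```python
def select_datasets(reference_scores, saturation_threshold=0.90):
    """Keep datasets whose best reference-model score is below the
    saturation threshold; drop tasks current models have already solved."""
    return [name for name, score in reference_scores.items()
            if score < saturation_threshold]

keep = select_datasets({"task_a": 0.95, "task_b": 0.62, "task_c": 0.88})
# -> ["task_b", "task_c"]
```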